Evaluation of chemical and gene/protein entity recognition systems at BioCreative V.5: the CEMP and GPRO patents tracks
Abstract
This paper presents the results of the BioCreative V.5 offline tasks, which evaluated the performance of, and assessed the progress made by, strategies for the automatic recognition of chemical and gene mentions in the running text of medicinal chemistry patent abstracts. A total of 21 teams submitted results for at least one of these tasks. The CEMP (chemical entity mention in patents) task entailed the detection of chemical named entity mentions. A total of 14 teams submitted 56 runs. The top-performing team reached an F-score of 0.90, with a precision of 0.88 and a recall of 0.93. The GPRO (gene and protein related object) task focused on the detection of mentions of gene and protein related objects. The 7 participating teams (30 runs) had to detect gene/protein mentions that could be linked to at least one biological database, such as SwissProt or EntrezGene. The best F-score, recall and precision in this task were 0.79, 0.83 and 0.77, respectively.
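The relationship between the reported scores can be checked with a short sketch. It assumes the F-score here is the balanced F-measure (F1), the harmonic mean of precision and recall; the abstract does not name the variant, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 is the harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 measure.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # no true positives at all: define the score as 0
    return (1 + beta ** 2) * precision * recall / denominator


# Top CEMP run, using the rounded precision and recall from the abstract.
cemp_top = f_score(precision=0.88, recall=0.93)
print(f"{cemp_top:.2f}")  # prints 0.90
```

With the rounded precision and recall of the top CEMP run, the sketch reproduces the reported F-score of 0.90. The GPRO figures are listed as the best value per metric and may not all come from a single run, so the same check is not applied to them here.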
Similar resources
Mining Patents with tmChem, GNormPlus and an Ensemble of Open Systems
The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. The CHEMDNER task at BioCreative V focused on information extraction from patents. This manuscript describes our submissions to the CEMP (chemical named entity recognition) and GPRO (gene and related object identification) subtasks. Our CEMP submission is an ensemble of...
DUTIR at the BioCreative V.5.BeCalm Tasks: A BLSTM-CRF Approach for Biomedical Entity Recognition in Patents
Patents contain a significant amount of information. Biomedical text mining of patents has received much attention recently, especially in the medicinal chemistry domain. The BioCreative V.5.BeCalm tasks focus on biomedical entity recognition in patents. This paper describes the method used to create our submissions to the Chemical Entity Mention recognition (CEMP) and Gene and Protein Rela...
Neji: Recognition of Chemical and Gene Mentions in Patent Texts
The BioCreative V.5 challenge focused on the recognition of chemical and gene mentions in medicinal chemistry patents. For participation in the chemical entity (CEMP) and gene and protein (GPRO) recognition tasks, we used the concept recognition framework Neji and applied a machine-learning strategy using an optimized feature set. Our best submissions achieved an F-score of 86.6% for the identi...
CRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools
This paper relates to the two offline BioCreative V.5 BeCalm tasks. The first challenge is CEMP, the recognition of chemical named entity mentions. The second challenge is GPRO, the recognition of gene and protein related objects in running text. We focus on training and optimizing state-of-the-art solutions for named entity tagging for CEMP and GPRO. Finally, we present CRFVoter, a two staged ...
HTSZ_CEM System for Chemical Entity Mention Recognition in Patents
This paper proposes a machine learning-based system for the chemical entity mention recognition in patents (CEMP) challenge task of BioCreative V. The CEMP task was treated as a sequence labeling problem and addressed with conditional random fields (CRF). Evaluation on the CEMP challenge corpus showed that our system (team 293) achieved a micro F-measure of 87.03%.
Publication date: 2017